Overview

Dataset statistics

Number of variables: 12
Number of observations: 678013
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 44.0 MiB
Average record size in memory: 68.0 B
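The two memory figures above are consistent with a simple storage model. This is a hedged sketch, and its storage assumption is not stated in the report: each of the 8 numeric columns uses 8 bytes per value (float64 or int64), and each of the 4 categorical columns uses pandas' categorical dtype with 1-byte codes. Under that assumption, the arithmetic below reproduces the per-record and total sizes:

```python
MIB = 1024 ** 2  # bytes per mebibyte

N_ROWS = 678013      # "Number of observations"
N_NUMERIC = 8        # "Numeric" variable count
N_CATEGORICAL = 4    # "Categorical" variable count

# Assumption (not stated in the report): 8 bytes per numeric value and
# 1 byte per categorical code; the small overhead of storing the category
# labels themselves is ignored.
BYTES_PER_NUMERIC = 8
BYTES_PER_CATEGORICAL_CODE = 1


def assumed_record_size() -> int:
    """Bytes per row under the storage assumption above."""
    return N_NUMERIC * BYTES_PER_NUMERIC + N_CATEGORICAL * BYTES_PER_CATEGORICAL_CODE


def assumed_total_mib() -> float:
    """Total frame size in MiB implied by the per-row estimate."""
    return N_ROWS * assumed_record_size() / MIB


print(assumed_record_size())          # 68
print(round(assumed_total_mib(), 1))  # 44.0
```

Under the same assumption, each numeric column needs 678013 × 8 B ≈ 5.2 MiB, which matches the "Memory size" reported for every numeric variable. Each categorical column comes out at about 662 KiB (one byte per row) plus a small overhead for its category labels.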

Variable types

Numeric: 8
Categorical: 4

Alerts

DrivAge is highly correlated with BonusMalus [High correlation]
BonusMalus is highly correlated with DrivAge [High correlation]
Area is highly correlated with Density and 1 other field [High correlation]
Density is highly correlated with Area and 1 other field [High correlation]
Region is highly correlated with Area and 1 other field [High correlation]
IDpol has unique values [Unique]
ClaimNb has 643953 (95.0%) zeros [Zeros]
VehAge has 57739 (8.5%) zeros [Zeros]

Reproduction

Analysis started: 2022-03-02 08:16:41.898343
Analysis finished: 2022-03-02 08:17:13.176074
Duration: 31.28 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json

Variables

IDpol
Real number (ℝ≥0)

UNIQUE

Distinct: 678013
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2621856.921
Minimum: 1
Maximum: 6114330
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.2 MiB

Quantile statistics

Minimum: 1
5th percentile: 69365.6
Q1: 1157951
median: 2272152
Q3: 4046274
95th percentile: 6014195.2
Maximum: 6114330
Range: 6114329
Interquartile range (IQR): 2888323

Descriptive statistics

Standard deviation: 1641782.753
Coefficient of variation (CV): 0.6261908266
Kurtosis: -0.6583474996
Mean: 2621856.921
Median Absolute Deviation (MAD): 1152062
Skewness: 0.2378901399
Sum: 1.777653077 × 10^12
Variance: 2.695450607 × 10^12
Monotonicity: Strictly increasing
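Several of the statistics in this block are derived from others in the same section. A minimal sketch using the standard definitions reproduces the IDpol figures. Those definitions are mean = sum / n, range = maximum − minimum, IQR = Q3 − Q1, CV = standard deviation / mean, and variance = standard deviation². All inputs are copied from this section, and the function name is purely illustrative:

```python
def derived_stats(n, total, std, minimum, maximum, q1, q3):
    """Recompute derived summary statistics from the reported base values."""
    mean = total / n
    return {
        "mean": mean,
        "range": maximum - minimum,
        "iqr": q3 - q1,
        "cv": std / mean,     # coefficient of variation
        "variance": std ** 2,
    }


idpol = derived_stats(
    n=678013,
    total=1.777653077e12,  # reported Sum (rounded to 10 significant digits)
    std=1641782.753,
    minimum=1,
    maximum=6114330,
    q1=1157951,
    q3=4046274,
)
for name, value in idpol.items():
    print(f"{name}: {value:.10g}")
```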
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value                  Count    Frequency (%)
2126260                1        < 0.1%
64613                  1        < 0.1%
1100330                1        < 0.1%
3251083                1        < 0.1%
52131                  1        < 0.1%
3160184                1        < 0.1%
3149256                1        < 0.1%
2017578                1        < 0.1%
3017221                1        < 0.1%
3240792                1        < 0.1%
Other values (678003)  678003   > 99.9%

Minimum 10 values

Value  Count  Frequency (%)
1      1      < 0.1%
3      1      < 0.1%
5      1      < 0.1%
10     1      < 0.1%
11     1      < 0.1%
13     1      < 0.1%
15     1      < 0.1%
17     1      < 0.1%
18     1      < 0.1%
21     1      < 0.1%

Maximum 10 values

Value    Count  Frequency (%)
6114330  1      < 0.1%
6114329  1      < 0.1%
6114328  1      < 0.1%
6114327  1      < 0.1%
6114326  1      < 0.1%
6114325  1      < 0.1%
6114324  1      < 0.1%
6114323  1      < 0.1%
6114322  1      < 0.1%
6114321  1      < 0.1%

ClaimNb
Real number (ℝ≥0)

ZEROS

Distinct: 11
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.05324676665
Minimum: 0
Maximum: 16
Zeros: 643953
Zeros (%): 95.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.2 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
median: 0
Q3: 0
95th percentile: 1
Maximum: 16
Range: 16
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.2401173304
Coefficient of variation (CV): 4.509519461
Kurtosis: 76.84187999
Mean: 0.05324676665
Median Absolute Deviation (MAD): 0
Skewness: 5.599613312
Sum: 36102
Variance: 0.05765633238
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=11)]

Common values

Value  Count   Frequency (%)
0      643953  95.0%
1      32178   4.7%
2      1784    0.3%
3      82      < 0.1%
4      7       < 0.1%
11     3       < 0.1%
5      2       < 0.1%
6      1       < 0.1%
8      1       < 0.1%
9      1       < 0.1%

Minimum 10 values

Value  Count   Frequency (%)
0      643953  95.0%
1      32178   4.7%
2      1784    0.3%
3      82      < 0.1%
4      7       < 0.1%
5      2       < 0.1%
6      1       < 0.1%
8      1       < 0.1%
9      1       < 0.1%
11     3       < 0.1%

Maximum 10 values

Value  Count   Frequency (%)
16     1       < 0.1%
11     3       < 0.1%
9      1       < 0.1%
8      1       < 0.1%
6      1       < 0.1%
5      2       < 0.1%
4      7       < 0.1%
3      82      < 0.1%
2      1784    0.3%
1      32178   4.7%
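ClaimNb takes only 11 distinct values, and the "Minimum 10 values" and "Maximum 10 values" tables together list all of them. The complete frequency table is therefore known. A minimal sketch, assuming the report's variance and standard deviation use the n − 1 denominator (the pandas default), recomputes the descriptive statistics of ClaimNb from those counts:

```python
import math

# ClaimNb value -> count, combined from the minimum/maximum tables above
# (together they cover all 11 distinct values).
CLAIMNB_COUNTS = {0: 643953, 1: 32178, 2: 1784, 3: 82, 4: 7, 5: 2,
                  6: 1, 8: 1, 9: 1, 11: 3, 16: 1}


def stats_from_counts(counts):
    """Mean, sample variance (ddof=1), standard deviation and CV from a frequency table."""
    n = sum(counts.values())
    total = sum(value * count for value, count in counts.items())
    mean = total / n
    sum_sq_dev = sum(count * (value - mean) ** 2 for value, count in counts.items())
    variance = sum_sq_dev / (n - 1)  # n - 1 denominator, as pandas uses by default
    std = math.sqrt(variance)
    return {"n": n, "sum": total, "mean": mean,
            "variance": variance, "std": std, "cv": std / mean}


for name, value in stats_from_counts(CLAIMNB_COUNTS).items():
    print(f"{name}: {value:.10g}")
```

The recomputed count (678013) and sum (36102) match the report exactly. The mean, variance, standard deviation and CV agree with the reported values to the displayed precision.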

Exposure
Real number (ℝ≥0)

Distinct: 187
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5287501058
Minimum: 0.00273224
Maximum: 2.01
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.2 MiB

Quantile statistics

Minimum: 0.00273224
5th percentile: 0.04
Q1: 0.18
median: 0.49
Q3: 0.99
95th percentile: 1
Maximum: 2.01
Range: 2.00726776
Interquartile range (IQR): 0.81

Descriptive statistics

Standard deviation: 0.3644415463
Coefficient of variation (CV): 0.6892510136
Kurtosis: -1.524243691
Mean: 0.5287501058
Median Absolute Deviation (MAD): 0.37
Skewness: 0.08531780026
Sum: 358499.4455
Variance: 0.1328176407
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value               Count   Frequency (%)
1                   168125  24.8%
0.08                44670   6.6%
0.07                12969   1.9%
0.24                12950   1.9%
0.5                 12497   1.8%
0.49                12298   1.8%
0.03                11996   1.8%
0.04                11131   1.6%
0.12                11047   1.6%
0.2                 8727    1.3%
Other values (177)  371603  54.8%

Minimum 10 values

Value           Count  Frequency (%)
0.00273224      295    < 0.1%
0.002732240437  765    0.1%
0.002739726     312    < 0.1%
0.002739726027  1733   0.3%
0.005464480874  464    0.1%
0.005464481     145    < 0.1%
0.005479452     355    0.1%
0.005479452055  1041   0.2%
0.008196721     113    < 0.1%
0.008196721311  507    0.1%

Maximum 10 values

Value  Count  Frequency (%)
2.01   2      < 0.1%
2      1      < 0.1%
1.99   1      < 0.1%
1.98   1      < 0.1%
1.93   1      < 0.1%
1.92   1      < 0.1%
1.9    2      < 0.1%
1.88   1      < 0.1%
1.85   2      < 0.1%
1.82   1      < 0.1%
< 0.1%

Area
Categorical

HIGH CORRELATION

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 662.5 KiB

C: 191880
D: 151596
E: 137167
A: 103957
B: 75459

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
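Python's standard `unicodedata` module exposes one of the character properties mentioned above: the Unicode general category of a code point. Examples are `Lu` for an uppercase letter, `Nd` for a decimal digit and `Ll` for a lowercase letter. This is a small illustrative sketch, not the report's own implementation. Unicode scripts and blocks are not exposed by the standard library, so the sketch leaves them out. The labels used come from the Sample rows of this report.

```python
import unicodedata
from collections import Counter


def general_category_counts(values):
    """Count Unicode general categories over every character of every string in `values`."""
    return Counter(unicodedata.category(ch) for text in values for ch in text)


print(general_category_counts(["D", "B12", "Regular", "R82"]))
```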

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: D
2nd row: D
3rd row: B
4th row: B
5th row: B

Common Values

Value  Count   Frequency (%)
C      191880  28.3%
D      151596  22.4%
E      137167  20.2%
A      103957  15.3%
B      75459   11.1%
F      17954   2.6%

Length

[Figure: Histogram of lengths of the category]

[Figure: Pie chart]

Most occurring words

Value  Count   Frequency (%)
c      191880  28.3%
d      151596  22.4%
e      137167  20.2%
a      103957  15.3%
b      75459   11.1%
f      17954   2.6%

Most occurring characters

Value   Count   Frequency (%)
No values found.

Most occurring categories

Value   Count   Frequency (%)
No values found.

Most frequent character per category

Most occurring scripts

Value   Count   Frequency (%)
No values found.

Most frequent character per script

Most occurring blocks

Value   Count   Frequency (%)
No values found.

Most frequent character per block

VehPower
Real number (ℝ≥0)

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.454631401
Minimum: 4
Maximum: 15
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.2 MiB

Quantile statistics

Minimum: 4
5th percentile: 4
Q1: 5
median: 6
Q3: 7
95th percentile: 11
Maximum: 15
Range: 11
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 2.050905698
Coefficient of variation (CV): 0.3177417222
Kurtosis: 1.668206924
Mean: 6.454631401
Median Absolute Deviation (MAD): 1
Skewness: 1.17134444
Sum: 4376324
Variance: 4.206214181
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=12)]

Common values

Value             Count   Frequency (%)
6                 148976  22.0%
7                 145401  21.4%
5                 124821  18.4%
4                 115349  17.0%
8                 46956   6.9%
10                31354   4.6%
9                 30085   4.4%
11                18352   2.7%
12                8214    1.2%
13                3229    0.5%
Other values (2)  5276    0.8%

Minimum 10 values

Value  Count   Frequency (%)
4      115349  17.0%
5      124821  18.4%
6      148976  22.0%
7      145401  21.4%
8      46956   6.9%
9      30085   4.4%
10     31354   4.6%
11     18352   2.7%
12     8214    1.2%
13     3229    0.5%

Maximum 10 values

Value  Count   Frequency (%)
15     2926    0.4%
14     2350    0.3%
13     3229    0.5%
12     8214    1.2%
11     18352   2.7%
10     31354   4.6%
9      30085   4.4%
8      46956   6.9%
7      145401  21.4%
6      148976  22.0%
22.0%

VehAge
Real number (ℝ≥0)

ZEROS

Distinct: 78
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.044264638
Minimum: 0
Maximum: 100
Zeros: 57739
Zeros (%): 8.5%
Negative: 0
Negative (%): 0.0%
Memory size: 5.2 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 2
median: 6
Q3: 11
95th percentile: 17
Maximum: 100
Range: 100
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 5.66623158
Coefficient of variation (CV): 0.804375172
Kurtosis: 6.522053975
Mean: 7.044264638
Median Absolute Deviation (MAD): 4
Skewness: 1.147988998
Sum: 4776103
Variance: 32.10618032
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value              Count   Frequency (%)
1                  71284   10.5%
2                  59124   8.7%
0                  57739   8.5%
3                  50261   7.4%
4                  43492   6.4%
5                  38737   5.7%
10                 38395   5.7%
6                  35717   5.3%
7                  32880   4.8%
8                  32680   4.8%
Other values (68)  217704  32.1%

Minimum 10 values

Value  Count  Frequency (%)
0      57739  8.5%
1      71284  10.5%
2      59124  8.7%
3      50261  7.4%
4      43492  6.4%
5      38737  5.7%
6      35717  5.3%
7      32880  4.8%
8      32680  4.8%
9      31922  4.7%

Maximum 10 values

Value  Count  Frequency (%)
100    25     < 0.1%
99     23     < 0.1%
85     1      < 0.1%
84     1      < 0.1%
83     2      < 0.1%
82     1      < 0.1%
81     3      < 0.1%
80     3      < 0.1%
79     1      < 0.1%
78     1      < 0.1%

DrivAge
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 83
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 45.4991217
Minimum: 18
Maximum: 100
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.2 MiB

Quantile statistics

Minimum: 18
5th percentile: 25
Q1: 34
median: 44
Q3: 55
95th percentile: 72
Maximum: 100
Range: 82
Interquartile range (IQR): 21

Descriptive statistics

Standard deviation: 14.13744407
Coefficient of variation (CV): 0.3107190545
Kurtosis: -0.3426884401
Mean: 45.4991217
Median Absolute Deviation (MAD): 10
Skewness: 0.4357585748
Sum: 30848996
Variance: 199.867325
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value              Count   Frequency (%)
36                 17530   2.6%
38                 17346   2.6%
39                 17320   2.6%
37                 17295   2.6%
52                 17195   2.5%
34                 17059   2.5%
40                 17017   2.5%
51                 17016   2.5%
41                 16977   2.5%
42                 16953   2.5%
Other values (73)  506305  74.7%

Minimum 10 values

Value  Count  Frequency (%)
18     748    0.1%
19     2392   0.4%
20     3676   0.5%
21     4437   0.7%
22     5291   0.8%
23     6261   0.9%
24     7393   1.1%
25     8697   1.3%
26     10301  1.5%
27     11827  1.7%

Maximum 10 values

Value  Count  Frequency (%)
100    3      < 0.1%
99     70     < 0.1%
98     5      < 0.1%
97     10     < 0.1%
96     15     < 0.1%
95     24     < 0.1%
94     32     < 0.1%
93     55     < 0.1%
92     66     < 0.1%
91     121    < 0.1%

BonusMalus
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 115
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 59.76150162
Minimum: 50
Maximum: 230
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.2 MiB

Quantile statistics

Minimum: 50
5th percentile: 50
Q1: 50
median: 50
Q3: 64
95th percentile: 95
Maximum: 230
Range: 180
Interquartile range (IQR): 14

Descriptive statistics

Standard deviation: 15.63665766
Coefficient of variation (CV): 0.2616510167
Kurtosis: 2.674811214
Mean: 59.76150162
Median Absolute Deviation (MAD): 0
Skewness: 1.728934068
Sum: 40519075
Variance: 244.5050628
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value               Count   Frequency (%)
50                  384156  56.7%
100                 19530   2.9%
68                  18791   2.8%
72                  18580   2.7%
76                  18226   2.7%
64                  18192   2.7%
80                  18086   2.7%
57                  17938   2.6%
60                  17363   2.6%
54                  17360   2.6%
Other values (105)  129791  19.1%

Minimum 10 values

Value  Count   Frequency (%)
50     384156  56.7%
51     15869   2.3%
52     4770    0.7%
53     3351    0.5%
54     17360   2.6%
55     5593    0.8%
56     3453    0.5%
57     17938   2.6%
58     5970    0.9%
59     2779    0.4%

Maximum 10 values

Value  Count  Frequency (%)
230    1      < 0.1%
228    1      < 0.1%
218    1      < 0.1%
208    1      < 0.1%
198    2      < 0.1%
196    3      < 0.1%
195    6      < 0.1%
190    3      < 0.1%
187    3      < 0.1%
185    5      < 0.1%

VehBrand
Categorical

Distinct: 11
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 662.6 KiB

B12: 166024
B1: 162736
B2: 159861
B3: 53395
B5: 34753
Other values (6): 101244

Length

Max length: 3
Median length: 2
Mean length: 2.314951188
Min length: 2

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: B12
2nd row: B12
3rd row: B12
4th row: B12
5th row: B12

Common Values

Value  Count   Frequency (%)
B12    166024  24.5%
B1     162736  24.0%
B2     159861  23.6%
B3     53395   7.9%
B5     34753   5.1%
B6     28548   4.2%
B4     25179   3.7%
B10    17707   2.6%
B11    13585   2.0%
B13    12178   1.8%

Length

[Figure: Histogram of lengths of the category]

Most occurring words

Value  Count   Frequency (%)
b12    166024  24.5%
b1     162736  24.0%
b2     159861  23.6%
b3     53395   7.9%
b5     34753   5.1%
b6     28548   4.2%
b4     25179   3.7%
b10    17707   2.6%
b11    13585   2.0%
b13    12178   1.8%

Most occurring characters

Value   Count   Frequency (%)
No values found.

Most occurring categories

Value   Count   Frequency (%)
No values found.

Most frequent character per category

Most occurring scripts

Value   Count   Frequency (%)
No values found.

Most frequent character per script

Most occurring blocks

Value   Count   Frequency (%)
No values found.

Most frequent character per block

VehGas
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 662.4 KiB

Regular: 345877
Diesel: 332136

Length

Max length: 7
Median length: 7
Mean length: 6.510133287
Min length: 6
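With only two categories, the mean length is a count-weighted average of the label lengths: "Regular" has 7 characters and "Diesel" has 6. The median length is 7 because more than half of the rows are "Regular". A short check using the counts of this section:

```python
# Category -> count for VehGas, from the Common Values table of this section.
VEHGAS_COUNTS = {"Regular": 345877, "Diesel": 332136}


def mean_length(counts):
    """Count-weighted mean string length of the category labels."""
    n = sum(counts.values())
    return sum(len(label) * count for label, count in counts.items()) / n


print(mean_length(VEHGAS_COUNTS))
```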

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Regular
2nd row: Regular
3rd row: Diesel
4th row: Diesel
5th row: Diesel

Common Values

Value    Count   Frequency (%)
Regular  345877  51.0%
Diesel   332136  49.0%

Length

[Figure: Histogram of lengths of the category]

[Figure: Pie chart]

Most occurring words

Value    Count   Frequency (%)
regular  345877  51.0%
diesel   332136  49.0%

Most occurring characters

Value   Count   Frequency (%)
No values found.

Most occurring categories

Value   Count   Frequency (%)
No values found.

Most frequent character per category

Most occurring scripts

Value   Count   Frequency (%)
No values found.

Most frequent character per script

Most occurring blocks

Value   Count   Frequency (%)
No values found.

Most frequent character per block

Density
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1607
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1792.422405
Minimum: 1
Maximum: 27000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.2 MiB

Quantile statistics

Minimum: 1
5th percentile: 20
Q1: 92
median: 393
Q3: 1658
95th percentile: 7313
Maximum: 27000
Range: 26999
Interquartile range (IQR): 1566

Descriptive statistics

Standard deviation: 3958.646564
Coefficient of variation (CV): 2.208545571
Kurtosis: 24.86945063
Mean: 1792.422405
Median Absolute Deviation (MAD): 355
Skewness: 4.65142115
Sum: 1215285692
Variance: 15670882.62
Monotonicity: Not monotonic
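Density's large skewness (4.65) and kurtosis (24.87) reflect its long right tail: the mean (1792) lies far above the median (393). This minimal stdlib sketch assumes the report follows pandas' `Series.skew()` and `Series.kurt()`. Under that assumption, these are the bias-adjusted Fisher-Pearson skewness G1 and the bias-adjusted excess kurtosis G2, which is 0 for normally distributed data. Both are built from the biased central moments m2, m3 and m4:

```python
import math


def central_moment(xs, k):
    """Biased k-th central moment: the mean of (x - mean)**k."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)


def sample_skewness(xs):
    """Adjusted Fisher-Pearson skewness G1 (needs at least 3 values)."""
    n = len(xs)
    m2, m3 = central_moment(xs, 2), central_moment(xs, 3)
    return math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5


def sample_excess_kurtosis(xs):
    """Bias-adjusted excess kurtosis G2 (needs at least 4 values); 0 for normal data."""
    n = len(xs)
    m2, m4 = central_moment(xs, 2), central_moment(xs, 4)
    g2 = m4 / m2 ** 2 - 3
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)


data = [1, 2, 3, 4, 5]
print(sample_skewness(data), sample_excess_kurtosis(data))
```

For the symmetric sample [1, 2, 3, 4, 5], the sketch gives a skewness of 0 and an excess kurtosis of -1.2, up to floating-point rounding. A right-skewed sample such as [1, 2, 3, 10] gives a positive skewness, as Density does.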
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value                Count   Frequency (%)
27000                10515   1.6%
3317                 9891    1.5%
1313                 7157    1.1%
9307                 5986    0.9%
3744                 5540    0.8%
1326                 5447    0.8%
405                  5195    0.8%
4128                 5055    0.7%
4762                 4985    0.7%
57                   4262    0.6%
Other values (1597)  613980  90.6%

Minimum 10 values

Value  Count  Frequency (%)
1      7      < 0.1%
2      92     < 0.1%
3      304    < 0.1%
4      274    < 0.1%
5      438    0.1%
6      752    0.1%
7      1088   0.2%
8      1131   0.2%
9      1813   0.3%
10     2911   0.4%

Maximum 10 values

Value  Count  Frequency (%)
27000  10515  1.6%
23396  66     < 0.1%
22821  182    < 0.1%
22669  463    0.1%
21410  76     < 0.1%
20000  6      < 0.1%
18229  200    < 0.1%
17140  910    0.1%
16533  613    0.1%
16291  175    < 0.1%

Region
Categorical

HIGH CORRELATION

Distinct: 22
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 663.0 KiB

R24: 160601
R82: 84752
R93: 79315
R11: 69791
R53: 42122
Other values (17): 241432

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: R82
2nd row: R82
3rd row: R22
4th row: R72
5th row: R72

Common Values

Value              Count   Frequency (%)
R24                160601  23.7%
R82                84752   12.5%
R93                79315   11.7%
R11                69791   10.3%
R53                42122   6.2%
R52                38751   5.7%
R91                35805   5.3%
R72                31329   4.6%
R31                27285   4.0%
R54                19046   2.8%
Other values (12)  89216   13.2%

Length

[Figure: Histogram of lengths of the category]

Most occurring words

Value              Count   Frequency (%)
r24                160601  23.7%
r82                84752   12.5%
r93                79315   11.7%
r11                69791   10.3%
r53                42122   6.2%
r52                38751   5.7%
r91                35805   5.3%
r72                31329   4.6%
r31                27285   4.0%
r54                19046   2.8%
Other values (12)  89216   13.2%

Most occurring characters

Value   Count   Frequency (%)
No values found.

Most occurring categories

Value   Count   Frequency (%)
No values found.

Most frequent character per category

Most occurring scripts

Value   Count   Frequency (%)
No values found.

Most frequent character per script

Most occurring blocks

Value   Count   Frequency (%)
No values found.

Most frequent character per block

Interactions

2022-03-02T08:17:09.377871image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:16:52.810517image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:16:55.179821image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:16:57.374812image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:16:59.858194image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:17:02.209346image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:17:04.715539image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:17:07.058261image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:17:09.683833image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:16:53.103044image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:16:55.441205image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:16:57.677076image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:17:00.141541image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:17:02.666470image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:17:05.011098image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:17:07.342067image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:17:10.023841image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:16:53.421665image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:16:55.709274image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:16:57.991074image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:17:00.442522image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:17:02.958149image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:17:05.320834image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:17:07.644673image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:17:10.396132image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:16:53.736307image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:16:55.980283image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:16:58.326240image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:17:00.738048image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:17:03.254859image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:17:05.632063image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:17:07.945089image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:17:10.681117image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:16:54.018834image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:16:56.242794image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:16:58.626178image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:17:01.028357image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:17:03.544639image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:17:05.924571image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:17:08.219227image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:17:10.960906image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:16:54.306998image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:16:56.525351image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:16:58.937072image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:17:01.311428image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:17:03.831405image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:17:06.208597image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:17:08.493436image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:17:11.267064image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:16:54.593171image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:16:56.777244image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:16:59.248671image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:17:01.613277image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:17:04.113978image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:17:06.487231image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:17:08.790937image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:17:11.588948image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:16:54.921522image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:16:57.070164image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:16:59.570340image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:17:01.909077image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:17:04.404014image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:17:06.793397image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2022-03-02T08:17:09.089916image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables. It is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates a perfect negative monotonic correlation, 0 indicates no monotonic correlation and +1 indicates a perfect positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
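The recipe above translates directly into code. This is a minimal stdlib sketch, not the report's implementation. It first replaces each value by its rank, with tied values given the average of the ranks they occupy; this matches the default `method="average"` of pandas' `Series.rank`. It then computes Pearson's r on the two rank vectors. The example data are made up:

```python
import math


def average_ranks(xs):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values equal to xs[order[i]]
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Covariance over the product of standard deviations (the 1/n factors cancel)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman(xs, ys):
    """Spearman's rho: Pearson's r of the rank variables."""
    return pearson(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # monotonic but not linear
print(spearman(x, y), pearson(x, y))
```

For this monotonic but nonlinear example, ρ is 1 (up to floating-point rounding) while r is slightly below 1.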

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative linear correlation, 0 indicates no linear correlation and +1 indicates a perfect positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, up to a change of sign when a scale factor is negative. This implies that for an exact linear relationship, the steepness of the line (its angle to the x-axis) does not affect r; only the direction of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
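Both the definition and the invariance property can be checked numerically. This stdlib sketch computes r literally as the sample covariance divided by the product of the sample standard deviations. It then applies separate location and scale changes to each variable. The sample values are made up for illustration:

```python
import math


def sample_covariance(xs, ys):
    """Sample covariance with the n - 1 denominator."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    sx = math.sqrt(sample_covariance(xs, xs))
    sy = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (sx * sy)


x = [18, 25, 34, 44, 55, 72]  # hypothetical values
y = [95, 80, 68, 57, 50, 50]  # hypothetical values

r = pearson_r(x, y)
r_shifted_scaled = pearson_r([3 + 2 * v for v in x], [0.5 * v - 10 for v in y])
r_sign_flipped = pearson_r(x, [-v for v in y])
print(r, r_shifted_scaled, r_sign_flipped)
```

The first two values agree up to floating-point rounding, and negating one variable only flips the sign. For an exact linear relationship such as `pearson_r(x, [3 * v + 1 for v in x])`, r is 1 for any positive slope.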

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates perfect negative correlation, 0 indicates no correlation and +1 indicates perfect positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n − 1)/2.
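The description above is the tau-a variant, which can be sketched with a plain loop over all pairs. This is an illustration only: the loop is O(n²), so it is impractical for 678013 rows. Pairs tied in either variable count as neither concordant nor discordant. With many ties, as in BonusMalus (56.7% of rows equal 50), tau-a cannot reach ±1. SciPy's `scipy.stats.kendalltau`, which pandas uses for `corr(method="kendall")`, computes the tie-adjusted tau-b by default, so its values can differ from this sketch on tied data.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / (n(n-1)/2); pairs tied in x or y count as neither."""
    n = len(xs)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 5 concordant, 1 discordant
```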

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V are known to be biased, even for large samples. We therefore use a bias-corrected measure proposed by Bergsma in 2013.
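The bias correction can be sketched as follows, using the formulas from Bergsma (2013) as they are commonly stated. The report does not show pandas-profiling's exact implementation, so this is an illustration, not a reproduction of it. φ² = χ²/n is reduced by (k − 1)(r − 1)/(n − 1) and clipped at 0. The numbers of rows r and columns k are corrected in the same spirit. V is then the square root of the corrected φ² divided by min(k̃ − 1, r̃ − 1). The example data are hypothetical.

```python
import math
from collections import Counter


def cramers_v_bias_corrected(xs, ys):
    """Bias-corrected Cramér's V (Bergsma, 2013) for two equal-length nominal sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals, col_totals = Counter(xs), Counter(ys)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic over the full r x k contingency table
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Remove the approximate bias of phi^2, then correct the table dimensions
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denominator = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denominator) if denominator > 0 else 0.0


area = ["A", "A", "B", "B", "C", "C"] * 10
zone = ["x", "x", "y", "y", "z", "z"] * 10  # hypothetical: fully determined by area
print(cramers_v_bias_corrected(area, zone))
```

In this example, each zone is fully determined by the area, so V is 1 up to rounding. For exactly independent columns, the corrected φ² is clipped to 0 and so is V.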

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables. It captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display that lets you quickly pick out visual patterns in data completion.]

Sample

First rows

   IDpol  ClaimNb  Exposure  Area  VehPower  VehAge  DrivAge  BonusMalus  VehBrand  VehGas   Density  Region
0  1.0    1        0.10      D     5         0       55       50          B12       Regular  1217     R82
1  3.0    1        0.77      D     5         0       55       50          B12       Regular  1217     R82
2  5.0    1        0.75      B     6         2       52       50          B12       Diesel   54       R22
3  10.0   1        0.09      B     7         0       46       50          B12       Diesel   76       R72
4  11.0   1        0.84      B     7         0       46       50          B12       Diesel   76       R72
5  13.0   1        0.52      E     6         2       38       50          B12       Regular  3003     R31
6  15.0   1        0.45      E     6         2       38       50          B12       Regular  3003     R31
7  17.0   1        0.27      C     7         0       33       68          B12       Diesel   137      R91
8  18.0   1        0.71      C     7         0       33       68          B12       Diesel   137      R91
9  21.0   1        0.15      B     7         0       41       50          B12       Diesel   60       R52

Last rows

        IDpol      ClaimNb  Exposure  Area  VehPower  VehAge  DrivAge  BonusMalus  VehBrand  VehGas   Density  Region
678003  6114321.0  0        0.005479  E     4         0       29       80          B12       Regular  5360     R11
678004  6114322.0  0        0.005479  E     11        0       49       74          B12       Diesel   5360     R11
678005  6114323.0  0        0.005479  D     4         0       34       80          B12       Regular  731      R82
678006  6114324.0  0        0.005479  D     11        0       41       50          B12       Diesel   528      R93
678007  6114325.0  0        0.005479  E     6         4       40       68          B12       Regular  2733     R93
678008  6114326.0  0        0.002740  E     4         0       54       50          B12       Regular  3317     R93
678009  6114327.0  0        0.002740  E     4         0       41       95          B12       Regular  9850     R11
678010  6114328.0  0        0.002740  D     6         2       45       50          B12       Diesel   1323     R82
678011  6114329.0  0        0.002740  B     4         0       60       50          B12       Regular  95       R26
678012  6114330.0  0        0.002740  B     7         6       29       54          B12       Diesel   65       R72